We first create the data. In particular, we create a continuous vector:
set.seed(123)
x <- rnorm(n = 300, mean = 10, sd = 5)
Null hypothesis: the mean of x is equal to 0. We have a large sample size, so we can use the t-test.
t.test(x = x, mu = 0)
##
## One Sample t-test
##
## data: x
## t = 37.258, df = 299, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 9.634917 10.709497
## sample estimates:
## mean of x
## 10.17221
If you check the help page you will see that mu = 0 is the default option, so this argument can be omitted:
t.test(x = x)
##
## One Sample t-test
##
## data: x
## t = 37.258, df = 299, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 9.634917 10.709497
## sample estimates:
## mean of x
## 10.17221
Output interpretation: the sample mean is 10.17 and the 95% confidence interval is (9.63, 10.71). The p-value is very small, so we reject the null hypothesis that the true mean is equal to 0. The test statistic is 37.26 and can be obtained using the formula \(\frac{\bar{x} - \mu_0}{sd(x)/\sqrt{n}}\):
test_stat <- mean(x) / (sd(x) / sqrt(300))  # (mean(x) - 0) / SE, since mu0 = 0
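As a quick check, this hand-computed value matches the t statistic reported by t.test() above:
test_stat                    # ≈ 37.258
unname(t.test(x)$statistic)  # the same value, extracted from the test object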
Now let’s assume that we want to investigate whether the sample mean is equal to 10:
t.test(x = x, mu = 10)
##
## One Sample t-test
##
## data: x
## t = 0.63074, df = 299, p-value = 0.5287
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
## 9.634917 10.709497
## sample estimates:
## mean of x
## 10.17221
In that case the p-value is large (0.5287), so we cannot reject the null hypothesis that the true mean is equal to 10. The test statistic can also be obtained as:
test_stat <- (mean(x) - 10)/(sd(x) / sqrt(300))
The p-value can also be obtained as:
2 * pt(q = test_stat, df = 300 - 1, lower.tail = FALSE)
## [1] 0.5286913
2 * (1 - pt(q = test_stat, df = 300 - 1, lower.tail = TRUE))
## [1] 0.5286913
By default, a two-sided test is performed. To perform a one-sided test, the argument alternative can be set to 'less' or 'greater':
t.test(x, mu = 10, alternative = 'less')
##
## One Sample t-test
##
## data: x
## t = 0.63074, df = 299, p-value = 0.7357
## alternative hypothesis: true mean is less than 10
## 95 percent confidence interval:
## -Inf 10.62269
## sample estimates:
## mean of x
## 10.17221
t.test(x, mu = 10, alternative = 'greater')
##
## One Sample t-test
##
## data: x
## t = 0.63074, df = 299, p-value = 0.2643
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
## 9.721728 Inf
## sample estimates:
## mean of x
## 10.17221
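These one-sided p-values can also be computed directly from the t distribution, using the test_stat calculated above for mu = 10:
pt(q = test_stat, df = 300 - 1, lower.tail = TRUE)   # 'less':    ≈ 0.7357
pt(q = test_stat, df = 300 - 1, lower.tail = FALSE)  # 'greater': ≈ 0.2643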
Furthermore, we can change the confidence level of the interval using the argument conf.level:
t.test(x, mu = 10, alternative = 'less', conf.level = 0.975)
##
## One Sample t-test
##
## data: x
## t = 0.63074, df = 299, p-value = 0.7357
## alternative hypothesis: true mean is less than 10
## 97.5 percent confidence interval:
## -Inf 10.7095
## sample estimates:
## mean of x
## 10.17221
What if we do not want to print the whole output? In that case we can save the test results as an object and then select the parts that we want to print:
test_res <- t.test(x, mu = 10, alternative = 'less', conf.level = 0.975)
test_res$statistic
## t
## 0.6307416
test_res$p.value
## [1] 0.7356544
test_res$null.value
## mean
## 10
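To see every component stored in the test object, inspect its names:
names(test_res)  # statistic, parameter, p.value, conf.int, estimate, null.value, ...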
Let’s now assume that we only have 30 subjects (small sample size). We first create the data:
set.seed(123)
x <- rnorm(n = 30, mean = 10, sd = 5)
Null hypothesis: the median of x is equal to 0. We have a small sample size, so we can use the Wilcoxon signed rank test:
wilcox.test(x = x, mu = 0)
##
## Wilcoxon signed rank exact test
##
## data: x
## V = 465, p-value = 1.863e-09
## alternative hypothesis: true location is not equal to 0
Note that confidence intervals are only returned if conf.int = TRUE:
wilcox.test(x = x, mu = 0, conf.int = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: x
## V = 465, p-value = 1.863e-09
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
## 7.713874 11.621940
## sample estimates:
## (pseudo)median
## 9.680038
The additional argument exact controls whether exact p-values and confidence intervals are calculated or whether the normal approximation is used. In the latter case, the argument correct determines whether a continuity correction is applied.
wilcox.test(x = x, mu = 0, exact = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: x
## V = 465, p-value = 1.863e-09
## alternative hypothesis: true location is not equal to 0
wilcox.test(x = x, mu = 0, exact = FALSE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 465, p-value = 1.825e-06
## alternative hypothesis: true location is not equal to 0
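As a sketch of what exact = FALSE does, the approximate p-value can be reconstructed from the normal distribution of V under the null hypothesis (subtracting 0.5 as the continuity correction, since V lies above its null mean):
n <- length(x)
mu_V <- n * (n + 1) / 4                       # mean of V under H0
sd_V <- sqrt(n * (n + 1) * (2 * n + 1) / 24)  # sd of V under H0
z <- (465 - mu_V - 0.5) / sd_V                # V = 465 from the output above
2 * pnorm(q = abs(z), lower.tail = FALSE)     # ≈ 1.825e-06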
Specific parts of the output can also be extracted:
test_res <- wilcox.test(x = x, mu = 0, exact = TRUE)
test_res$statistic
## V
## 465
The test statistics \(W_-\) and \(W_+\) can also be obtained by hand; the V statistic reported above corresponds to \(W_+\):
res <- rank(abs(x - 0))  # ranks of the absolute differences from mu0 = 0
sum(res[(x - 0) < 0])    # W-: sum of the ranks of the negative differences
## [1] 0
sum(res[(x - 0) > 0])    # W+: sum of the ranks of the positive differences
## [1] 465
We first create the data. In particular, we create two continuous vectors:
set.seed(123)
x <- rnorm(n = 300, mean = 10, sd = 5)
y <- rnorm(n = 300, mean = 11, sd = 2)
Null hypothesis: the mean of x is equal to the mean of y. Let’s assume that the samples are independent. We have a large sample size, so we can use the t-test.
t.test(x = x, y = y)
##
## Welch Two Sample t-test
##
## data: x and y
## t = -2.8587, df = 400.49, p-value = 0.004475
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.427798 -0.264225
## sample estimates:
## mean of x mean of y
## 10.17221 11.01822
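The Welch test statistic can also be obtained by hand as the difference in means divided by its standard error, which combines the two (possibly unequal) sample variances:
(mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))  # ≈ -2.8587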
It is also possible to specify the test using a formula. This is useful when we have the data in a data.frame:
dat <- data.frame(value = c(x, y), group = rep(x = c(1, 2), each = length(x)))
t.test(value ~ group, data = dat)
##
## Welch Two Sample t-test
##
## data: value by group
## t = -2.8587, df = 400.49, p-value = 0.004475
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -1.427798 -0.264225
## sample estimates:
## mean in group 1 mean in group 2
## 10.17221 11.01822
By default, the test does not assume that the two samples have equal variances (this is the Welch test); check the help page for all this information! Setting var.equal = TRUE gives the classical two-sample t-test:
t.test(x = x, y = y, var.equal = TRUE)
##
## Two Sample t-test
##
## data: x and y
## t = -2.8587, df = 598, p-value = 0.004401
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4272165 -0.2648068
## sample estimates:
## mean of x mean of y
## 10.17221 11.01822
An F test can be used to check whether the two samples have the same variance:
var.test(x = x, y = y)
##
## F test to compare two variances
##
## data: x and y
## F = 5.7173, num df = 299, denom df = 299, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 4.555655 7.175163
## sample estimates:
## ratio of variances
## 5.717304
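The F statistic is simply the ratio of the two sample variances:
var(x) / var(y)  # ≈ 5.7173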
Let’s now assume that the samples are dependent. In that case we need to set the argument paired = TRUE:
t.test(x = x, y = y, paired = TRUE)
##
## Paired t-test
##
## data: x and y
## t = -2.7989, df = 299, p-value = 0.005461
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -1.4408460 -0.2511774
## sample estimates:
## mean difference
## -0.8460117
This is equivalent to performing a one-sample t-test of the differences x - y:
t.test(x = x - y)
##
## One Sample t-test
##
## data: x - y
## t = -2.7989, df = 299, p-value = 0.005461
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -1.4408460 -0.2511774
## sample estimates:
## mean of x
## -0.8460117
We can change mu to test whether the mean difference is equal to a value other than zero:
t.test(x = x, y = y, mu = 10, paired = TRUE)
##
## Paired t-test
##
## data: x and y
## t = -35.883, df = 299, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 10
## 95 percent confidence interval:
## -1.4408460 -0.2511774
## sample estimates:
## mean difference
## -0.8460117
Let’s now assume that we only have 30 subjects (small sample size). We first create the data:
set.seed(123)
x <- rnorm(n = 30, mean = 10, sd = 5)
y <- rnorm(n = 30, mean = 11, sd = 2)
Null hypothesis: the distribution of x is equal to the distribution of y. Let’s assume that the samples are independent. We have a small sample size, so we can use the Wilcoxon rank sum test:
wilcox.test(x = x, y = y, correct = TRUE, conf.int = TRUE)
##
## Wilcoxon rank sum exact test
##
## data: x and y
## W = 331, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## -3.7349469 0.1653389
## sample estimates:
## difference in location
## -1.857596
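The W statistic can also be obtained by hand: it is the sum of the ranks of x within the combined sample, minus the smallest value this sum can take:
r <- rank(c(x, y))                                      # ranks in the combined sample
sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2  # ≈ 331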
Check the help page for the correct argument. Let’s now assume that the samples are dependent. In that case we can use the Wilcoxon signed rank test:
wilcox.test(x = x, y = y, paired = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: x and y
## V = 156, p-value = 0.1191
## alternative hypothesis: true location shift is not equal to 0
We first create the data. In particular, we create three continuous vectors:
set.seed(123)
x <- rnorm(n = 300, mean = 10, sd = 5)
y <- rnorm(n = 300, mean = 11, sd = 2)
z <- rnorm(n = 300, mean = 15, sd = 7)
Null hypothesis: the means of x, y and z are identical. We have a large sample size, so we can use ANOVA.
dat <- data.frame(value = c(x, y, z), group = rep(x = c(1, 2, 3), each = length(x)))
boxplot(value ~ group, data = dat)
test_res <- aov(formula = value ~ group, data = dat)
summary(test_res)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 3667 3667 137 <2e-16 ***
## Residuals 898 24032 27
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
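Caution: group was created as a numeric vector, so aov() treats it as a continuous covariate; that is why the output above shows only 1 degree of freedom for group instead of 2. For a genuine one-way ANOVA comparing the three group means, convert group to a factor (the group row will then show 2 Df):
test_res <- aov(formula = value ~ factor(group), data = dat)
summary(test_res)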
Let’s now assume that we only have 30 subjects (small sample size). We first create the data:
set.seed(123)
x <- rnorm(n = 30, mean = 10, sd = 5)
y <- rnorm(n = 30, mean = 11, sd = 2)
z <- rnorm(n = 30, mean = 15, sd = 7)
dat <- data.frame(value = c(x, y, z), group = rep(x = c(1, 2, 3), each = length(x)))
Null hypothesis: the distributions of x, y and z are identical. We have a small sample size, so we can use the Kruskal-Wallis test, which is an extension of the Wilcoxon rank sum test to more than two groups:
# (all of the following calls provide the same result)
kruskal.test(x = dat$value, g = dat$group)
##
## Kruskal-Wallis rank sum test
##
## data: dat$value and dat$group
## Kruskal-Wallis chi-squared = 17.188, df = 2, p-value = 0.0001852
kruskal.test(x = list(x, y, z))
##
## Kruskal-Wallis rank sum test
##
## data: list(x, y, z)
## Kruskal-Wallis chi-squared = 17.188, df = 2, p-value = 0.0001852
kruskal.test(formula = value ~ group, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: value by group
## Kruskal-Wallis chi-squared = 17.188, df = 2, p-value = 0.0001852
We first create the data:
set.seed(123)
x <- rnorm(n = 300, mean = 10, sd = 5)
y <- rnorm(n = 300, mean = 11, sd = 2)
Null hypothesis: the variables x and y are independent (no correlation). By default, the Pearson correlation is assumed.
cor.test(x = x, y = y)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = -1.0496, df = 298, p-value = 0.2947
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.17274411 0.05291402
## sample estimates:
## cor
## -0.06069048
Alternatively, we can obtain the test statistic and p-value using the formula \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), where \(r\) is the sample correlation:
test_stat <- (cor(x, y) * sqrt(300 - 2)) / sqrt(1 - cor(x, y)^2)
# lower.tail = TRUE because test_stat is negative here;
# 2 * pt(abs(test_stat), df = 298, lower.tail = FALSE) works for either sign
pVal <- 2 * pt(q = test_stat, df = 300 - 2, lower.tail = TRUE)
pVal
## [1] 0.2947457
Let’s now assume that we only have 30 subjects (small sample size). We first create the data:
set.seed(123)
x <- rnorm(n = 30, mean = 10, sd = 5)
y <- rnorm(n = 30, mean = 11, sd = 2)
Null hypothesis: the variables x and y are independent (no correlation). We have a small sample size, so we can use the Spearman correlation by changing the method argument:
# (with the `exact` argument we can select whether we want to perform the exact
# test or the approximate test)
cor.test(x = x, y = y, method = "spearman", exact = FALSE)
##
## Spearman's rank correlation rho
##
## data: x and y
## S = 5030, p-value = 0.531
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.1190211
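Spearman's rho is the Pearson correlation of the ranks and, in the absence of ties, it can also be recovered from the reported S statistic via \(\rho = 1 - \frac{6S}{n(n^2 - 1)}\):
cor(rank(x), rank(y))             # ≈ -0.1190211
1 - 6 * 5030 / (30 * (30^2 - 1))  # the same value, from S = 5030 above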
Another correlation coefficient is Kendall's tau:
cor.test(x = x, y = y, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: x and y
## T = 199, p-value = 0.5239
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.08505747
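The correlation coefficient itself (without a test) can also be computed directly with cor():
cor(x, y, method = "kendall")  # ≈ -0.08505747, matching tau above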